ABSTRACT


Student procrastination, the voluntary delay of intended work despite expecting to be worse off for the delay, is an important factor with potentially negative consequences for student well-being and learning. In online educational settings such as Massive Open Online Courses (MOOCs), procrastination is considered to be even more prevalent and detrimental, as online courses are often self-paced and self-directed, and higher levels of self-regulated learning are expected from students. Past research has mainly described students’ procrastination either by static time-related measures (e.g. the average starting time over all assignments per student) or by the parameters of temporal models, under the assumptions that student activities take place at a constant rate (e.g. homogeneous Poisson models) and that student interactions with one learning material are independent of interactions with another. In this work, we propose to consider the interdependence between students’ temporal activities while modeling their sequences on a continuous time scale. To this end, we propose to model the interaction sequence between each student and each course module, i.e. each module-student pair, as a multi-dimensional Hawkes process, which captures not only the relationship between students’ learning activities and exogenous stimuli such as assignment deadlines, but also the endogenous responses within and between types of learning materials. Our experiments show not only that there are dependencies between students’ past and future activities when different types of learning materials are involved, but also that these dependencies provide meaningful interpretations in terms of students’ procrastination behaviors. Furthermore, our findings show that, in addition to their association with delay, the parameters learned by multi-dimensional Hawkes processes provide more procrastination-related information and can improve our explanation of student grades.
1. INTRODUCTION


Student academic procrastination has been shown to have negative effects on students’ learning and well-being. Procrastination is prevalent in various academic settings, such as traditional classrooms, and could be even more widespread in online learning environments, where higher levels of time-management and self-regulated learning (SRL) skills are required [47] [3] [76]. To describe and measure student procrastination, past research has mainly relied on either self-reported surveys (e.g. [27]) or time-related features associated with students’ dilatory behaviors (e.g. [10]). As procrastination is inherently subjective, self-reported surveys were heavily used in earlier research to differentiate procrastinators from non-procrastinators, with an emphasis on measuring students’ perceptions. Although self-report survey measures capture students’ retrospective reports of their studying and delaying behaviors, they are administered in a cross-sectional manner, rely on students’ memory, and are usually static point estimates that summarize students’ average degree of procrastination.


Considering the noise in self-reported data [37], more recent studies have focused on the behavioral side of procrastination, where time-related measures were proposed and used as representations of students’ procrastination. For example, measures such as students’ average delay in starting coursework, the average time they spent on assignments, and their average pace of viewing lectures have been studied as factors of procrastination [6] [22]. However, these measures cannot describe students’ continuous behaviors within a period of time. An analogy to such methods is describing an entire distribution by its sample mean without knowing the distribution itself. To tackle this limitation, more recent work has emphasized modeling, via stochastic models, the time points of student activities extracted from students’ learning trajectory data (e.g. logs or click-streams of students’ historical actions). For example, in [33], Park et al. modeled students’ per-day activity counts during each week of the course via a Poisson mixture model, which models the entire trajectory of each student’s activities during a weekly module. Other factors that past research has considered important in describing procrastination are the effects of different learning materials (e.g. forums and quizzes) and students’ interactions with them (e.g. [?] [28]). However, to the best of our knowledge, no past work has considered the possible time dependencies within and between students’ interactions with different learning material types. For example, viewing video lectures mostly before the first attempt of an assignment may suggest that a student prefers to learn the materials before trying the assignment. On the other hand, watching lecture videos mostly after the first attempt of an assignment may suggest that the student prefers to try the assignment first and then go through the video lectures if they encounter any problems.

Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020) 280


To summarize, past research has attempted to describe procrastination using static time measures, or measures summarized from more sophisticated temporal models, based on students’ interactions with one or more learning materials. However, two important factors of student behavior and their association with procrastination have not been fully explored: (1) the dependencies between students’ past and future interactions within each learning material type (e.g. knowing that a student looked at lecture slides at some time, how and when will they perform their next activity?); and (2) the dependencies between students’ interactions with different types of learning materials (e.g. is watching a video lecture usually followed by an assignment submission?). In this work, we aim to address these two factors by answering the following questions within each learning module, that is, the unit of a course in which learning materials are provided. (Q1) Are past activities independent of future ones, or can some activities trigger others to arrive within a short period of time (i.e. are there time dependencies between activities)? (Q2) Are students’ interactions with one type of learning material (e.g. video lectures) independent of their interactions with another type (e.g. discussion forums)? Furthermore, (Q3) if such dependencies exist, how are they associated with student procrastination (i.e. both the dependencies between a student’s past and future activities and the dependencies between a student’s interactions with one learning material and another)?


As a result, our goal is to find the missing link between students’ procrastination and students’ activities within and between different types of online learning materials. To achieve this goal, we propose multi-dimensional Hawkes processes as a powerful tool that addresses the above-mentioned concerns in student procrastination analysis. In particular, we represent all activities on one type of learning material as one dimension of the multi-dimensional Hawkes model. We show that this model fits our data better than baseline temporal processes. Also, to answer Q1 and Q2, we demonstrate that it can capture both students’ reactions to deadlines as action-triggering factors that come from outside (i.e. exogenous stimuli) and students’ responses to their previous interactions with different types of learning materials, such as video lectures, assignments, and discussions (i.e. self-excitement). By doing so, we can understand students’ procrastination behavior from a stochastic process point of view, with two main stimuli: (1) some of the students’ activities can be viewed as responses to an external stimulus, e.g. assignment deadlines; and (2) other student activities can be viewed as the results of previous interactions that the student had with the same or other learning material types. Based on the model parameters, to answer Q3, we also propose a measure that not only describes student procrastination but also explains student performance better than the static delay measure.


The outline of this paper is as follows: in Section 2 we review three main bodies of related work; in Section 3 we describe the dataset that we use; in Section 4 we provide the intuition for using the Hawkes model, then statistically and visually show that a Hawkes process is a proper choice for modeling module-student interactions; in Section 5 we formally define our problem and introduce the multi-dimensional Hawkes model used in this study. In Section 6 we perform various experiments to analyze the model parameters, explain their interpretation, and associate them with procrastination as well as students’ assignment grades. Finally, Section 7 concludes this work.


2. RELATED WORK 


Students’ procrastination. In past research on student procrastination, the main focus has been on measures that capture either students’ perceptions (e.g. self-reported surveys on procrastination [23]) or static measures that describe students’ dilatory behaviors as representations of procrastination [33]. For example, in [10], Cerezo et al. studied 140 undergraduates and used measures such as students’ delay and time-spent variables to describe procrastination. As another example, in [?], Asarta and Schmidt studied students’ behaviors in accessing the lecture notes of a blended-learning course, and proposed features such as pacing, anti-cramming, and consistency in reviewing course materials. A few recent works have tried to model student activities to provide a temporal perspective on procrastination behavior. For example, Backhage et al. proposed a model that captures the procrastination-deadline cycles of all students in a course using a stochastic temporal model [4]. However, this model assumes that all students follow the same procrastination behavior during the course and does not distinguish between different students’ behaviors. In [33], Park et al. assumed that students’ daily activity counts follow a Poisson mixture distribution, which is a mixture of a procrastination component and a non-procrastination component. In particular, assuming independence between students’ past and future activities, they modeled each day of the week by a Poisson distribution with a constant rate across all weeks. They then described procrastinators as students whose procrastination component dominates the non-procrastination one, i.e. students whose activity counts increase rapidly towards the end of the week. Moon et al. assumed that procrastination behavior over time can be described by a curvilinear growth curve and modeled it using latent growth curve modeling. To validate this assumption, they compared the curvilinear model with a non-growth and a linear model and showed that their model has a better goodness-of-fit than the baseline models [30]. In contrast with existing research, which either uses summary variables or ignores the dependence between different student activities, in this paper we aim to model the temporal interrelationships between activities and associate them with student procrastination.


Modeling students’ engagement using their learning trajectories. Other studies relevant to our work model student learning trajectories to understand other aspects of student behavior, such as engagement in online learning environments. While many past studies focused mostly on cumulative factors, such as the frequency of watching videos or using discussion forums [13], more recent work has attempted to build more complex models of student behavior. For example, in [46], Zhu et al. constructed students’ social connection networks based on students’ weekly post-reply dynamics, along with node attributes such as assignment scores. In particular, they used an exponential random graph model to compute the structural features of the social connection networks, to understand the relationship between students’ engagement in the forums and their performance on the assignments. In another example, Lan et al. proposed a statistical model consisting of two components, a learning model and a response model [25]. These two models represent nine behavioral features extracted from students’ video-watching clickstreams and in-video quiz responses in one MOOC, with the aim of finding the behavioral features that lead to high levels of student engagement. Similarly, Kizilcec et al. classified students’ behaviors based on binary features extracted from students’ log data (1 if a student had any activity associated with a learning material, 0 otherwise, for all learning materials) [24]. From these binary features, they identified four behavioral types: completing, auditing, disengaging, and sampling. As another similar example, Gelman et al. also extracted features from students’ log data and applied Nonnegative Matrix Factorization to find five types of student behavior: deep, consistent, bursty, introduction, and sampling [18]. In particular, the authors used a procrastination indicator as a feature, namely the average amount of time left before the deadline when a student submits their assignments. In summary, past research on modeling student engagement is similar to our study in that the models use students’ activities extracted from log data. However, it differs from ours in two ways: (1) the models usually define students’ engagement levels based on counts of students’ historical activities, without directly modeling students’ learning trajectories as stochastic processes; and (2) the aim is usually to model or predict students’ future engagement levels, rather than to study students’ procrastination and its association with their performance.


Hawkes in education. Hawkes processes, a family of stochastic point processes, have frequently been used to model complex time-stamped events in continuous time. Owing to their capability to model scenarios where historical events influence future activities, Hawkes processes have been used extensively in finance [5] and seismology, and are becoming a useful modeling tool in social media [29] as well as in recommendation systems. In the education domain, only a few works have used Hawkes processes so far, mostly to model social and interaction data among students [26]. For example, Lan et al. proposed a single-dimensional Hawkes model to recommend relevant discussion threads to students according to their historical interactions with course forums. In a similar application, Von Davier et al. suggested a Hawkes model for the collaboration dynamics between students within and between groups [42]. Along this line, Halpin et al. used multi-dimensional Hawkes processes to understand students’ collaboration with each other [20]. Another interesting application of the Hawkes process in the education domain is the work by Boerner et al., which analyzed the association between student skills and the skills required by professional jobs [8]. In another recent work, Cai et al. used Hawkes processes as a step in their model to predict which video a student will watch next based on their historical interactions with the videos in an edX course [9]. They use long and short multi-dimensional Hawkes processes that differentiate the long-term and short-term temporal dependencies between video-watching actions. None of the above works uses Hawkes processes to model procrastination behavior, nor do they consider course deadlines and milestones in their application of the Hawkes process.


3. DATASET 

Our dataset is a publicly available dataset collected from the Canvas Network¹ MOOC platform [31], an online platform that hosts various open online courses in different academic disciplines, such as Computer Science, Social Science, and Business Management. These courses have multiple types of learning resources, including Wiki pages, assignments (or quizzes), and discussions. Assignments can be quiz-style or in a longer format, where students need to upload a file to complete the submission. Each learning module is associated with one Wiki page. In total, the Canvas data contains 389 anonymized courses, in which the names of students and courses, along with the contents of discussions and assignment (or quiz) submissions, are not available.


In this work, we mainly focus on exploring student learning trace data. Specifically, we select a computer science course (course id: 770000832960058) that best fits the following criteria: (1) having a large number of students²; (2) including multiple types of learning materials (such as video lectures, assignments, and discussions); and (3) containing a large number of historical student learning activities. To obtain student learning activity data, we use the Canvas log files (page-view requests). We divide the learning activities into three types. Specifically, we consider viewing lectures, downloading attached files, and previewing attached files as activities associated with video lectures (L). Activities that include viewing, creating, saving, updating, and submitting each assignment attempt are associated with assignments (A). Finally, we consider reading (marking as read), subscribing to, creating, replying to, and editing discussion entries, discussion topics, and direct messages as discussion-related activities (D).
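The three-way split of log records described above can be expressed as a small lookup table. This is only an illustrative sketch: the paper does not publish its preprocessing code, and the `(resource, action)` labels below are hypothetical stand-ins for whatever fields the real Canvas page-view logs contain.

```python
from typing import Optional

# Hypothetical (resource, action) labels: the actual Canvas page-view log
# fields and values are not given in the paper and will differ.
LECTURE_ACTIONS = {("lecture", "view"), ("file", "download"), ("file", "preview")}
ASSIGNMENT_ACTIONS = {
    ("assignment", action)
    for action in ("view", "create", "save", "update", "submit")
}
DISCUSSION_ACTIONS = {
    (resource, action)
    for resource in ("discussion_entry", "discussion_topic", "direct_message")
    for action in ("mark_read", "subscribe", "create", "reply", "edit")
}


def activity_type(resource: str, action: str) -> Optional[str]:
    """Return 'L', 'A', or 'D' for one log record, or None if it is not used."""
    key = (resource, action)
    if key in LECTURE_ACTIONS:
        return "L"
    if key in ASSIGNMENT_ACTIONS:
        return "A"
    if key in DISCUSSION_ACTIONS:
        return "D"
    return None


print(activity_type("file", "preview"))       # L
print(activity_type("assignment", "submit"))  # A
print(activity_type("wiki_page", "view"))     # None
```

Records that match none of the three sets are dropped rather than forced into a type, which mirrors the fact that only the listed actions are counted as L, A, or D activities.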


We separate the data into module-student pairs, as we aim to model each student’s interactions with each individual learning module. As each module has its own deadlines according to the course design, we choose each module, rather than the whole course, as the unit of our study. Also, different modules usually have different learning objectives, which may trigger different behaviors. By doing so, we are able to capture the data at a finer granularity. In total, we have 731 students and ~946K learning activities in the selected course.


¹ http://canvas.net

² Enrolled students who missed more than 50% of the assignment submissions during the course, along with those who did not receive a final grade, are considered dropouts and are disregarded in this study.
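A minimal sketch of the grouping step, assuming events have already been typed as L, A, or D: it applies the dropout rule from footnote 2 and builds one time-sorted sequence per (module, student) pair. The record layout and function names are hypothetical, not the authors' code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (student_id, module_id, timestamp, activity_type): a hypothetical record layout.
Event = Tuple[str, str, float, str]


def is_dropout(n_submitted: int, n_assignments: int,
               final_grade: Optional[float]) -> bool:
    """Footnote 2 rule: missed more than 50% of submissions, or no final grade."""
    missed = n_assignments - n_submitted
    return final_grade is None or missed > 0.5 * n_assignments


def module_student_pairs(
    events: Iterable[Event],
    n_submitted: Dict[str, int],
    n_assignments: int,
    final_grades: Dict[str, Optional[float]],
) -> Dict[Tuple[str, str], List[Tuple[float, str]]]:
    """Group non-dropout students' events into time-sorted (module, student) sequences."""
    pairs: Dict[Tuple[str, str], List[Tuple[float, str]]] = defaultdict(list)
    for student, module, t, kind in events:
        if is_dropout(n_submitted.get(student, 0), n_assignments,
                      final_grades.get(student)):
            continue
        pairs[(module, student)].append((t, kind))
    for seq in pairs.values():
        seq.sort()  # each pair becomes one sequence of (time, type) activities
    return dict(pairs)


# Toy usage: u2 missed 3 of 4 assignments, so u2 is treated as a dropout.
events = [("u1", "m1", 5.0, "L"), ("u1", "m1", 1.0, "A"),
          ("u2", "m1", 2.0, "D"), ("u1", "m2", 3.0, "D")]
pairs = module_student_pairs(events, {"u1": 4, "u2": 1}, 4,
                             {"u1": 88.0, "u2": 40.0})
print(pairs)
```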
4. BACKGROUND: HAWKES PROCESS AS
A FIT TO STUDENT ACTIVITIES 


Since we want to study the interactions between students and modules from a temporal perspective, point processes are a natural choice for our application. Additionally, because interactions in our application are irregular, we must select a point process that can handle such irregularity. Specifically, past studies have shown that students’ activities can take place irregularly during various periods of a course, particularly under the influence of milestones such as assignment deadlines and exam dates [16]. As a result, point processes with a constant rate, such as homogeneous Poisson processes, are not appropriate for our application. The main assumption of Poisson processes is the independence between past and future event occurrences, which is not met by student studying behaviors. Not only do some student activities happen in response to course milestones, but some of these activities can also be interrelated. For example, a student whose goal is to start a discussion topic in the forum may watch a video lecture on the same topic before posting. To meet the temporality and interdependence requirements of our application, we choose to model student activities during course modules with Hawkes processes.


One of the most important properties of the Hawkes process is its ability to capture the interrelationships between past and future activities. This is in contrast with the memoryless Poisson process, where all activities are assumed to be independent of each other. More importantly, the Hawkes process allows activities to be excited both exogenously (by external stimuli, similar to the Poisson process) and endogenously (self-excitement, by internal stimuli). In other words, the Hawkes process has a branching-process interpretation. It assumes that some activities arrive as a result of exogenous stimuli (i.e. immigrant activities). The immigrant activities can then trigger subsequent activities (i.e. offspring activities), and those offspring activities can in turn trigger their own offspring, and so on. That is, the offspring of an immigrant activity form a latent cluster, because they are all triggered by the same immigrant and arrive closer to each other than to activities in other clusters.
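The branching (cluster) view above also gives a simple way to simulate a Hawkes process. The sketch below is illustrative only: it assumes a one-dimensional process with an exponential kernel φ(s) = αβe^(−βs), so that each activity has on average α < 1 direct offspring, and its function names and parameter values are hypothetical rather than taken from the paper. Each simulated activity is tagged with the immigrant whose latent cluster it belongs to.

```python
import math
import random
from typing import List, Tuple


def poisson_count(rng: random.Random, mean: float) -> int:
    """Draw a Poisson(mean) count with Knuth's multiplication method."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_hawkes_branching(mu: float, alpha: float, beta: float, T: float,
                              seed: int = 0) -> List[Tuple[float, int]]:
    """Simulate a 1-D Hawkes process on [0, T] via its cluster representation.

    Kernel phi(s) = alpha * beta * exp(-beta * s): each event has on average
    `alpha` direct offspring (alpha < 1 keeps the process stable).
    Returns sorted (time, cluster_id) pairs; cluster_id indexes the immigrant.
    """
    rng = random.Random(seed)
    queue: List[Tuple[float, int]] = []
    # 1) Immigrants (exogenous activities): homogeneous Poisson with rate mu.
    t, cluster = rng.expovariate(mu), 0
    while t < T:
        queue.append((t, cluster))
        cluster += 1
        t += rng.expovariate(mu)
    # 2) Offspring: every event spawns Poisson(alpha) children after Exp(beta) delays.
    events: List[Tuple[float, int]] = []
    while queue:
        parent_time, cid = queue.pop()
        events.append((parent_time, cid))
        for _ in range(poisson_count(rng, alpha)):
            child_time = parent_time + rng.expovariate(beta)
            if child_time < T:  # children past T (and their descendants) are unobserved
                queue.append((child_time, cid))
    events.sort()
    return events


events = simulate_hawkes_branching(mu=0.2, alpha=0.6, beta=2.0, T=100.0, seed=3)
print(len(events), "activities in", len({c for _, c in events}), "latent clusters")
```

Setting `alpha=0` removes all offspring and recovers a homogeneous Poisson process, which is a convenient sanity check on the simulator.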


As a result, the Hawkes process can capture more information than the Poisson process or other point processes that use the average base rate as their only parameter. This can be very helpful when modeling processes that have the same number of activities but different distributions of activity occurrences. To demonstrate this ability of Hawkes processes, Figure 1 shows the event occurrence patterns of two simulated Hawkes processes with the same number of activities but different parameters. Process 1 has more bursty but less regular occurrences than process 2, in which less burstiness and higher regularity are observed. Since both simulated processes have the same number of activities, a Poisson model cannot capture such differences: its estimated base rate would be the same for both processes.
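To make the last point concrete, the sketch below compares two hypothetical sequences of 29 events each (echoing the count in Figure 1, not the figure's actual data). The maximum-likelihood rate of a homogeneous Poisson process on [0, T] is simply the event count divided by T, so it cannot tell the sequences apart. The coefficient of variation (CV) of inter-arrival times, used here as an assumed burstiness summary that is not part of the paper's method, clearly separates them (a Poisson process has CV close to 1).

```python
import statistics
from typing import List


def poisson_rate_mle(times: List[float], T: float) -> float:
    """MLE of a homogeneous Poisson rate on [0, T]: event count / window length."""
    return len(times) / T


def interarrival_cv(times: List[float]) -> float:
    """Coefficient of variation of inter-arrival times (about 1 for Poisson)."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return statistics.pstdev(gaps) / statistics.mean(gaps)


T = 100.0
# Hypothetical sequences with 29 events each.
regular = [1.0 + 3.0 * i for i in range(29)]
bursty = ([10.0 + 0.1 * k for k in range(10)]
          + [50.0 + 0.1 * k for k in range(10)]
          + [90.0 + 0.1 * k for k in range(9)])

print(poisson_rate_mle(regular, T), poisson_rate_mle(bursty, T))  # 0.29 0.29
print(interarrival_cv(regular), interarrival_cv(bursty))  # 0 vs. well above 1
```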


Figure 1: An illustration of two different processes that have the same number of occurrences. The x-axis is time and the y-axis is the intensity of event occurrences per time unit. Both processes have 29 occurrences but very different characteristics, which a Poisson process would ignore.

For an educational application, there is a natural mapping between student activity events and Hawkes processes. Smaller chunks of student activity toward a goal or deadline can be seen as an immigrant’s offspring: students break down big tasks (the whole process) into small sub-tasks (latent clusters). The deadline of a big task (an external stimulus), such as an assignment deadline, can trigger subsequent activities associated with the small tasks. These activities arrive closely one after another in a so-called bursty manner (self-excitement)³.

We demonstrate that Hawkes processes are a good fit for our application with two examples. First, we show that module-student pair interactions cannot be properly modeled by processes that only model an average base rate, such as Poisson processes. To do this, we conduct a goodness-of-fit test of the inter-arrival times of the module-student pairs in our dataset against the inter-arrival time distribution of a Poisson process, which is exponential (Exp(1) once time is rescaled by the process rate). We use the Kolmogorov-Smirnov test to evaluate the significance of the fit. The mean p-value of this test over all module-student pairs in our dataset is 2.77E-6, with a standard deviation of 6.41E-5, which shows that module-student pairs do not fit Poisson processes.
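The goodness-of-fit check above can be sketched in plain Python as follows. This is a minimal sketch, not the authors' code: it assumes each sequence's gaps are rescaled by their sample mean (i.e. the Poisson rate is estimated as 1/mean) and uses the asymptotic Kolmogorov p-value. In practice, `scipy.stats.kstest` would be the usual tool. Because the rate is estimated from the same data, the resulting p-value is conservative.

```python
import math
from typing import Sequence, Tuple


def ks_test_exponential(gaps: Sequence[float]) -> Tuple[float, float]:
    """One-sample KS test of inter-arrival times against an exponential law.

    Gaps are rescaled by their mean so they can be compared with Exp(1).
    Returns (D statistic, asymptotic p-value).
    """
    n = len(gaps)
    mean = sum(gaps) / n
    xs = sorted(g / mean for g in gaps)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = 1.0 - math.exp(-x)  # Exp(1) CDF
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    lam = math.sqrt(n) * d
    if lam < 0.2:  # the Kolmogorov tail probability is ~1 in this range
        return d, 1.0
    # Kolmogorov tail: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical bursty gaps: many tiny gaps interrupted by long pauses.
bursty_gaps = ([0.01] * 50 + [10.0] * 5) * 4
d, p = ks_test_exponential(bursty_gaps)
print(d, p)  # D is about 0.9 and p is essentially 0: exponential fit rejected
```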


Second, we empirically demonstrate the burstiness of module-student interactions. To do this, we show that the Poissonian property of having only a constant base rate is not present in the observed activities of a sample module-student pair from our dataset. Specifically, we use the 1-lag autocorrelation of activity inter-arrival times to conduct our test. The inter-arrival time is defined as the difference between the arrival times of two consecutive activity occurrences. We first simulate a Poisson process with a base rate equal to the average number of activities per unit time in our sample activity sequence. Then, we compare the 1-lag autocorrelation of this simulated sequence with that of our sample sequence. Since the inter-arrival times of a Poisson process are independent and identically exponentially distributed, we expect the autocorrelation of the simulated Poisson process to be 0 (no correlation). In contrast, we expect to see a non-zero autocorrelation in a bursty, self-exciting sequence. Figure 2 shows, for each of the two sequences, the scatter plot of activity inter-arrival times in the original sequence vs. the sequence with lag 1. As we can see, little autocorrelation is found in the Poisson process, whereas the autocorrelation pattern in the real data is clearly not random. Specifically, most of the lag-1 vs. original inter-arrival times of the sample sequence lie close to the axes, meaning that dense activities are often followed by long pauses, and vice versa.

³ It is worth noting that in regular applications of Hawkes processes, an activity at time t can trigger later activities at times τ > t. However, in our application, student activities are triggered by upcoming deadlines in the future. Similarly, earlier chunks of studying sub-tasks at times τ < t can be offspring of future studying tasks at time t towards a deadline. As a result, to make the Hawkes process applicable to our problem, we use a reversed activity timeline for our data. This does not affect our model, optimization, or learned parameters.
Figure 2: A demonstration of burstiness in the interactions of a sample module-student pair: 1-lag autocorrelation scatter plots show that long pauses are often observed after dense and bursty activities.
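The autocorrelation comparison described above can be sketched as follows. This is a hedged illustration with a hypothetical observed sequence, not the sample pair from the dataset. It uses the standard sample ACF estimator at lag 1 and simulates a Poisson process whose rate is the observed number of intervals per unit time.

```python
import random
from typing import List, Sequence


def lag1_autocorrelation(x: Sequence[float]) -> float:
    """Sample lag-1 autocorrelation (standard ACF estimator)."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def inter_arrival_times(times: Sequence[float]) -> List[float]:
    """Differences between consecutive arrival times."""
    return [b - a for a, b in zip(times, times[1:])]


def simulate_poisson(rate: float, n_events: int, seed: int = 0) -> List[float]:
    """Arrival times of a homogeneous Poisson process with the given rate."""
    rng = random.Random(seed)
    t, out = 0.0, []
    for _ in range(n_events):
        t += rng.expovariate(rate)  # i.i.d. exponential gaps
        out.append(t)
    return out


# Hypothetical bursty sequence: short gaps alternate with long pauses.
observed, t = [0.0], 0.0
for gap in [0.1, 5.0] * 50:
    t += gap
    observed.append(t)
rate = (len(observed) - 1) / (observed[-1] - observed[0])
simulated = simulate_poisson(rate, len(observed), seed=7)
print(lag1_autocorrelation(inter_arrival_times(observed)))   # about -0.99
print(lag1_autocorrelation(inter_arrival_times(simulated)))  # near 0, up to sampling noise
```

The strongly negative value for the observed sequence is the numerical counterpart of points lying close to the axes in Figure 2: a short gap tends to be followed by a long one and vice versa.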


It is worth mentioning that our goal is not to directly compare with Poisson models. Rather, we demonstrate here that we must model the data in a way that captures the long-term temporal properties of the processes and their irregularities, rather than static measures, such as the count or average number of activities, that provide only one facet of the whole picture.


5. METHOD: MULTI-DIMENSIONAL 
HAWKES PROCESSES TO MODEL 
ACTIVITY-TYPE RELATIONS 


In this section, we introduce the method we use in this study to model student behaviors. More specifically, we describe multi-dimensional Hawkes processes and how we apply them to our application. The previous section showed that the Hawkes process is a good fit for student activities, as future activities in module-student pairs can be related to past activities. In those illustrations, all activities in a module-student pair were considered homogeneous, i.e. of a single type. In other words, the self-exciting property of the interactions between students and modules was assumed to be uniform across different kinds of activities, whether watching a video lecture, participating in a discussion, or attempting to submit a solution to an assignment. However, in reality, students might exhibit different learning behaviors or use different learning strategies for different types of learning materials. For example, some students may have more intense and frequent activities when viewing module lectures but a slower pace when it comes to discussions. Furthermore, when a student interacts with two different types of learning materials, different time dependencies may exist between the student’s interactions with the two. For example, a student may often visit discussion forums very soon after viewing lecture slides, i.e. a strong time dependency between lectures and discussions (more specifically, discussions after lectures), but such dependencies may be less obvious for another student.

Figure 3: Hawkes processes in the module-lecture dimension L, the module-assignment dimension A, and the module-discussion dimension D, and their mutual excitation. A vertical bar represents an activity occurrence, and a red arrow shows the influence of one activity (start) on another (end).


To address this challenge, we model students’ activities on each type of learning material as an individual Hawkes process, and model all such processes simultaneously as a multi-dimensional Hawkes process. In the rest of this study, we refer to the collection of activities associated with one type of learning material as a Hawkes process dimension. A multi-dimensional Hawkes model not only allows the dependency between past and future activities within each dimension (i.e. self-excitation) to be modeled, but can also capture possible dependencies between different types of activities (i.e. excitation between dimensions). For example, scenarios such as submitting the first attempt of an assignment and then starting the second attempt (self-excitation), or posting a question in the discussion forum after watching a video lecture (excitation between dimensions), can be well described by multi-dimensional Hawkes processes.


In this study, based on the learning material types present in our dataset, we consider three dimensions to analyze students’ learning behaviors: the video lecture dimension L, the assignment dimension A, and the discussion dimension D. To illustrate how multi-dimensional Hawkes processes model between-dimension excitation, Figure 3 shows three sample Hawkes processes that come from dimensions L, A, and D, respectively. Within each dimension, we use vertical dashed lines to represent the occurrences of activities that take place in that dimension⁴. Activities in one dimension can trigger activities in other dimensions, which constitutes the influence between dimensions. We indicate the between-dimension triggers by red arrows that point from the parent activity to its offspring. For example, the third activity in dimension D in this figure triggers the fifth activity in dimension A as well as the sixth activity in dimension L.
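The parent-to-offspring arrows of Figure 3 can be reproduced with a multi-dimensional version of the cluster simulation. The sketch below is a hypothetical illustration, not the paper's code. It assumes an exponential kernel α_{dd'}βe^(−βs), so that `alpha[(d, s)]` is the mean number of direct offspring in dimension d triggered by one activity in dimension s. The base rates and influence values are made up, and for stability the spectral radius of the α matrix must be below 1. Each simulated activity records its parent, so between-dimension triggers can be counted directly.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

DIMS = ("L", "A", "D")
Event = Tuple[float, str, Optional[int]]  # (time, dimension, parent index or None)


def poisson_count(rng: random.Random, mean: float) -> int:
    """Poisson(mean) draw via Knuth's multiplication method (fine for small means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_multidim_hawkes(mu: Dict[str, float],
                             alpha: Dict[Tuple[str, str], float],
                             beta: float, T: float, seed: int = 0) -> List[Event]:
    """Cluster-representation simulation of a multi-dimensional Hawkes process on [0, T]."""
    rng = random.Random(seed)
    pending: List[Event] = []
    for d in DIMS:  # exogenous (immigrant) activities in each dimension
        if mu[d] <= 0:
            continue
        t = rng.expovariate(mu[d])
        while t < T:
            pending.append((t, d, None))
            t += rng.expovariate(mu[d])
    raw: List[Event] = []
    while pending:  # each activity spawns offspring in every dimension
        t, d, parent = pending.pop()
        idx = len(raw)
        raw.append((t, d, parent))
        for child_dim in DIMS:
            for _ in range(poisson_count(rng, alpha[(child_dim, d)])):
                child_t = t + rng.expovariate(beta)
                if child_t < T:
                    pending.append((child_t, child_dim, idx))
    # Sort by time and remap parent indices to positions in the sorted list.
    order = sorted(range(len(raw)), key=lambda i: raw[i][0])
    new_pos = {old: new for new, old in enumerate(order)}
    return [(raw[i][0], raw[i][1],
             None if raw[i][2] is None else new_pos[raw[i][2]]) for i in order]


mu = {"L": 0.5, "A": 0.3, "D": 0.2}                 # hypothetical base rates
alpha = {(d, s): 0.1 for d in DIMS for s in DIMS}   # hypothetical influences
alpha[("A", "D")] = 0.3  # hypothetical: discussions often trigger assignment work
events = simulate_multidim_hawkes(mu, alpha, beta=2.0, T=50.0, seed=1)
cross = sum(1 for t, d, p in events if p is not None and events[p][1] != d)
print(len(events), "activities,", cross, "between-dimension triggers")
```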


⁴ The height of each bar does not represent intensity and does not have any particular meaning in this figure.

We now formally describe the multi-dimensional Hawkes model and how it can be interpreted in our application. Suppose that for each module-student pair $(m,u)$, we are given a sequence of arrival times for the $N_{m,u}$ activities associated with module $m$ and student $u$. We represent the sequence of each module-student pair as $(m,u) = \{\tau_i\}_{i=1}^{N_{m,u}}$, where $\tau_i = (t_i, d_i)$ consists of the arrival time $t_i$ of the $i$-th activity and the dimension (activity type) $d_i$ to which activity $i$ belongs. For example, suppose student $u$ has 3 activities in total in module $m$. If $u$ submitted an assignment at time 1, then checked a lecture’s slides at time 5, and posted in a discussion at time 8, then $(m,u) = \{(t_1 = 1, d_1 = A), (t_2 = 5, d_2 = L), (t_3 = 8, d_3 = D)\}$, with $A$, $L$, and $D$ representing assignments, video lectures, and discussions, respectively. For each dimension $d \in \{L, A, D\}$ and each module-student pair $(m,u)$, we further use the sequence $T_d = \{t_i \mid \tau_i \in (m,u),\ d_i = d\}$ to represent, as a process, the type-$d$ learning activities that student $u$ performs in module $m$. According to the multi-dimensional Hawkes model, the intensity of $T_d$ is given by the following function:


$$\lambda_d(t) = \mu_d + \sum_{d'} \sum_{t_j < t,\ d_j = d'} \phi_{dd'}(t - t_j), \qquad (1)$$


where $\mu_d$ describes the average number of activities per unit time that are triggered by exogenous stimuli (the process's base rate in dimension $d$), and $\phi$ (the kernel function) explains the endogenous stimuli, i.e. the triggering effects of previous ($t_j < t$) activities in the same dimension or in another dimension ($d'$). In other words, $\phi_{dd'}$ controls the total influence that dimension $d$ exerts on dimension $d'$, as a function of the activity inter-arrival time $(t - t_j)$. Using an exponential kernel function for $\phi$, the multi-dimensional Hawkes model can be rewritten as in Equation 2:


$$\lambda_d(t) = \mu_d + \sum_{d'} \alpha_{dd'} \sum_{t_j < t,\ d_j = d'} \beta \exp\big(-\beta (t - t_j)\big). \qquad (2)$$


The term $\alpha_{dd'}$ and the term $\beta \exp(-\beta(t - t_j))$ can be considered as the decomposition of the kernel function $\phi_{dd'}$: they respectively describe the influence weight of dimension $d$ on dimension $d'$ (including $\alpha_{dd}$, the self-excitation of dimension $d$ itself) and an exponential decay function $g(t) = \beta \exp(-\beta t)$. Putting together all activity types' parameters, we use a $d$-dimensional vector $\mu = [\mu_d]$ to represent the base rates of the processes in all dimensions, and a $d \times d$ matrix $\Phi = [\phi_{dd'}]$ to represent the between- and within-dimension triggering effects. From here, we can write $\Phi = I \circ G$, with influence matrix $I = [\alpha_{dd'}]$ and exponential decay kernel $G = [\sum_{t_j < t} \beta \exp(-\beta(t - t_j))]$. Based on that, we can also describe the aggregated influence of dimension $d$ on other dimensions using the following equation:

$$\alpha_d = \frac{1}{3} \sum_{d' \in \{L, A, D\}} \alpha_{dd'}, \qquad (3)$$

which is simply the average influence of dimension $d$ over all dimensions. A summary of all notations used so far is shown in Table 1.
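To make Equations 2 and 3 concrete, here is a minimal NumPy sketch that evaluates the exponential-kernel intensity for the three-activity example sequence above and computes the aggregated influence. It assumes the dimension order (L, A, D), a single decay β shared by all dimensions, and the index convention of Equation 2 as written (entry `alpha[d, d']` scales the effect of past type-$d'$ activities on the intensity of $d$); transpose `alpha` for the "influence of $d$ on $d'$" reading used in the prose. Function names and parameter values are illustrative, not the authors' implementation.

```python
import numpy as np

DIMS = ["L", "A", "D"]  # assumed order: lectures, assignments, discussions


def intensity(t, events, mu, alpha, beta):
    """Equation 2: exponential-kernel intensity lambda_d(t) for all three dimensions.

    events: list of (t_j, d_j) pairs with d_j in DIMS
    mu:     base rates, shape (3,)
    alpha:  shape (3, 3); alpha[d, d'] scales the effect of past type-d' activities on d
    beta:   decay shared by all dimensions
    """
    lam = np.array(mu, dtype=float)  # copy, so the caller's mu is left untouched
    for t_j, d_j in events:
        if t_j < t:  # only strictly earlier activities contribute
            k = DIMS.index(d_j)
            lam += alpha[:, k] * beta * np.exp(-beta * (t - t_j))
    return lam


def aggregated_influence(alpha):
    """Equation 3: alpha_d as the mean of row d of the influence matrix."""
    return np.asarray(alpha, dtype=float).mean(axis=1)


# Hypothetical parameters and the example sequence from the text.
events = [(1.0, "A"), (5.0, "L"), (8.0, "D")]
mu = np.array([0.1, 0.2, 0.05])
alpha = np.array([[0.5, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.7]])
lam_at_9 = intensity(9.0, events, mu, alpha, beta=1.0)
alpha_d = aggregated_influence(alpha)
```

Before the first activity the intensity reduces to the base rates, and each past activity adds a contribution that fades exponentially with its age.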


| Notation | Description | Formula |
| --- | --- | --- |
| $L$ | dimension: module lectures | |
| $A$ | dimension: module assignments | |
| $D$ | dimension: module discussions | |
| $(t_i, d_i)$ | activity $i$ in dimension $d_i$ | |
| $\tau_i$ | arrival time | $(t_i, d_i)$ |
| $(m,u)$ | module $m$, student $u$ pair | $\{\tau_i\}$ |
| $T_d(\tau_i)$ | activities in dimension $d$ | $\{t_i \in \tau_i \mid d_i = d\}$ |
| $\mu_d$ | base rate in dimension $d$ | |
| $\alpha_{dd'}$ | influence of $d$ on $d'$ | |
| $\alpha_d$ | aggregated influence of dimension $d$ | $\frac{1}{3}\sum_{d'} \alpha_{dd'}$ |
| $\beta$ | decay parameter | |
| $g(t)$ | decay kernel function | $\beta \exp(-\beta t)$ |
| $\phi_{dd'}(t)$ | Hawkes kernel function | $\alpha_{dd'}\, g(t)$ |
| $I$ | influence matrix | $[\alpha_{dd'}]$ |
| $G$ | decay kernel matrix | $[\sum_{t_j < t} \beta \exp(-\beta(t - t_j))]$ |
| $\Phi$ | Hawkes kernel matrix | $I \circ G$ |
| $\lambda_d$ | intensity in $d$ | Equation 1 |

Table 1: Notations and their descriptions.

This intensity function $\lambda_d(t)$ has an intuitive meaning: all future activities in dimension $d$, apart from those that are triggered by external stimuli, can be triggered by previous activities that belong to each of the dimensions $d'$ (including $d$ itself) according to the influence weight $\alpha_{dd'}$ (the outer summation). The ones that are triggered by external stimuli take place with rate $\mu_d$. Furthermore, as a past activity becomes more distant (larger $t - t_j$), its effect on the occurrence probability of a new event decreases exponentially (the inner summation). From the branching process point of view, the kernel function $\phi_{dd'}$ is designed so that $\alpha_{dd'}$ is the branching ratio, since $g(t)$ integrates to 1. By computing $\frac{1}{1 - \alpha_{dd'}}$, we can obtain the expected number of future activities in dimension $d'$ that are triggered by an immigrant in dimension $d$. This represents the size of an offspring cluster.
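The branching-ratio reading can be sanity-checked by simulation in a single dimension. If every activity independently triggers a Poisson($\alpha$) number of children (which holds when the kernel $\alpha g(t)$ integrates to $\alpha$), a cluster is a Galton-Watson tree whose expected total size is $1/(1-\alpha)$ for $\alpha < 1$. The sketch below is a univariate illustration with a hypothetical $\alpha = 0.6$; a multi-dimensional version would track children per dimension.

```python
import math
import random


def poisson(lam, rng):
    """Poisson(lam) draw via Knuth's multiplication method (adequate for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def cluster_size(alpha, rng):
    """Immigrant plus all descendants when each activity has Poisson(alpha) children."""
    size, frontier = 1, 1
    while frontier:
        children = sum(poisson(alpha, rng) for _ in range(frontier))
        size += children
        frontier = children
    return size


rng = random.Random(0)
alpha = 0.6  # hypothetical branching ratio, must be < 1 for finite clusters
sizes = [cluster_size(alpha, rng) for _ in range(20000)]
empirical = sum(sizes) / len(sizes)
theoretical = 1.0 / (1.0 - alpha)  # expected cluster size from the branching view
```

With 20,000 simulated clusters, the empirical mean lands close to $1/(1-0.6) = 2.5$.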


To avoid possible confusion, we also want to clarify that by saying one activity $i$ in dimension $d'$ triggers another activity $j$ in dimension $d$, we mean that the probability of activity $j$ being the result of activity $i$ is higher than the probability of $j$ coming from the base rate $\mu_d$ or being triggered by other activities. To see this, one can interpret Equation 1 as follows: in dimension $d$, a sequence of activities comes from the base rate $\mu_d$, and each summation term leads to a sequence of activities with parameter $\phi_{dd'}$. Then, the probability of $j$ being triggered by $i$ is

$$P(j \text{ child of } i) = \frac{\phi_{dd'}(t_j - t_i)}{\mu_d + \sum_{i':\, t_{i'} < t_j} \phi_{d d_{i'}}(t_j - t_{i'})}. \qquad (4)$$
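Equation 4 can be evaluated directly: the numerator is one earlier activity's kernel contribution, and the denominator is the full intensity at $t_j$. The sketch below assumes the exponential kernel $g(t) = \beta e^{-\beta t}$, the (L, A, D) order, and the index convention of Equations 1 and 2; the function name and toy values are hypothetical.

```python
import math

DIMS = ["L", "A", "D"]  # assumed order


def parent_probabilities(j, events, mu, alpha, beta):
    """Equation 4: probability that activity j was triggered by each earlier activity,
    plus the probability that it came from the base rate ("base").

    events must be sorted by time; alpha[d][d'] scales a type-d' parent's effect on a
    type-d child (the indexing of Equations 1-2); g(t) = beta * exp(-beta * t).
    """
    t_j, d_j = events[j]
    child = DIMS.index(d_j)
    weights = {"base": mu[child]}
    for i, (t_i, d_i) in enumerate(events[:j]):
        if t_i < t_j:
            parent = DIMS.index(d_i)
            weights[i] = alpha[child][parent] * beta * math.exp(-beta * (t_j - t_i))
    total = sum(weights.values())  # the denominator of Equation 4
    return {key: w / total for key, w in weights.items()}


events = [(1.0, "A"), (5.0, "L"), (8.0, "D")]
mu = [0.1, 0.2, 0.05]
alpha = [[0.5, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.3, 0.3, 0.7]]
probs = parent_probabilities(2, events, mu, alpha, beta=1.0)
```

Because the kernel decays, the more recent lecture activity (index 1) is a far more likely parent of the discussion post than the earlier assignment (index 0).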


Parameter Estimation. A common way to find the best parameters of a Hawkes model, given the observed activity arrival times, is to minimize the negative log-likelihood of the data. Particularly, given the sequence $\{(t_1, d_1), \ldots, (t_N, d_N)\}$ up to some time $T$, the log-likelihood of influence matrix $I$ and base rate vector $\mu$ has the following form:

$$\log \mathcal{L}(I, \mu) = \sum_{i=1}^{N} \log\Big(\mu_{d_i} + \sum_{t_j < t_i} \alpha_{d_i d_j}\, g(t_i - t_j)\Big) - T \sum_{d} \mu_d - \sum_{d} \sum_{j=1}^{N} \alpha_{d d_j} \int_0^{T - t_j} g(t)\, dt. \qquad (5)$$
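For the exponential kernel, the integral in Equation 5 has the closed form $\int_0^{T-t_j} \beta e^{-\beta s}\,ds = 1 - e^{-\beta(T - t_j)}$, so the log-likelihood can be computed directly. The naive $O(N^2)$ sketch below assumes the (L, A, D) order and the index convention of Equation 1; names and toy values are hypothetical.

```python
import math

DIMS = ["L", "A", "D"]  # assumed order


def log_likelihood(events, T, mu, alpha, beta):
    """Equation 5 with g(t) = beta * exp(-beta * t).

    events: list of (t, d) sorted by time, all with 0 <= t <= T.
    alpha[d][d'] scales past type-d' activities in the intensity of d.
    """
    idx = [DIMS.index(d) for _, d in events]
    times = [t for t, _ in events]
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, idx)):
        lam = mu[d_i] + sum(
            alpha[d_i][idx[j]] * beta * math.exp(-beta * (t_i - times[j]))
            for j in range(i) if times[j] < t_i
        )
        ll += math.log(lam)
    ll -= T * sum(mu)  # compensator of the base rates
    for t_j, d_j in zip(times, idx):  # compensator of the triggered part
        decay_mass = 1.0 - math.exp(-beta * (T - t_j))  # closed-form kernel integral
        ll -= sum(alpha[d][d_j] for d in range(len(DIMS))) * decay_mass
    return ll


events = [(1.0, "A"), (5.0, "L"), (8.0, "D")]
mu = [0.1, 0.2, 0.05]
alpha = [[0.5, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.3, 0.3, 0.7]]
ll = log_likelihood(events, 10.0, mu, alpha, beta=1.0)
```

With all influence weights set to zero, the expression reduces to the homogeneous Poisson log-likelihood $\sum_i \log \mu_{d_i} - T \sum_d \mu_d$.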
d d 0 


In order to find the Hawkes parameters that model each module-student pair, we adopted the ADM4 algorithm [45], which adds a mix of Lasso and nuclear-norm regularization on top of the negative log-likelihood. Specifically, the Accelerated Projected Gradient Descent method was used to meet the non-negativity constraints on $I$ and $\mu$, as Hawkes parameters only have realistic meanings when they are non-negative². For the selection of the global parameter $\beta$ for each module-student pair, we use grid search with cross-validation on the interval [0, 10] with step size 1.

²We made our implementation available at https://github.com/ssahebi/EDM2020-Hawkes
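The projection idea behind the non-negativity constraints can be illustrated with a much simpler procedure than ADM4. The sketch below fits only a univariate process and drops the Lasso and nuclear-norm terms and the acceleration. After each gradient-ascent step on the log-likelihood, it clips $\mu$ to a small positive floor and $\alpha$ to [0, 0.999]. A backtracking step size keeps the log-likelihood from decreasing. $\beta$ is then picked from the grid 1, ..., 10 by training log-likelihood rather than cross-validation; $\beta = 0$ is left out because it makes the kernel vanish. All names, the starting point, and the toy data are hypothetical.

```python
import math


def loglik_and_grad(times, T, mu, alpha, beta):
    """Univariate exponential-kernel log-likelihood and its gradient in (mu, alpha).

    Assumes strictly increasing event times in [0, T]; g(t) = beta * exp(-beta * t).
    """
    ll = g_mu = g_alpha = 0.0
    s, prev = 0.0, None  # s = sum_{j<i} beta * exp(-beta * (t_i - t_j)), updated recursively
    for t in times:
        if prev is not None:
            s = (s + beta) * math.exp(-beta * (t - prev))
        lam = mu + alpha * s
        ll += math.log(lam)
        g_mu += 1.0 / lam
        g_alpha += s / lam
        prev = t
    comp = sum(1.0 - math.exp(-beta * (T - t)) for t in times)  # kernel integrals
    ll -= mu * T + alpha * comp
    return ll, g_mu - T, g_alpha - comp


def fit_fixed_beta(times, T, beta, iters=200, eps=1e-8):
    """Projected gradient ascent with backtracking (a plain stand-in, not ADM4)."""
    mu, alpha = len(times) / (2 * T), 0.5  # hypothetical starting point
    ll, g_mu, g_alpha = loglik_and_grad(times, T, mu, alpha, beta)
    step = 1e-2
    for _ in range(iters):
        while step > 1e-12:
            new_mu = max(mu + step * g_mu, eps)                       # project: mu > 0
            new_alpha = min(max(alpha + step * g_alpha, 0.0), 0.999)  # project: 0 <= alpha < 1
            new = loglik_and_grad(times, T, new_mu, new_alpha, beta)
            if new[0] >= ll:
                break
            step /= 2
        else:
            break  # no non-decreasing step found: stop
        mu, alpha = new_mu, new_alpha
        ll, g_mu, g_alpha = new
        step = min(step * 1.5, 10.0)
    return mu, alpha, ll


def select_beta(times, T, grid=range(1, 11)):
    """Grid search over beta by training log-likelihood (the paper uses cross-validation)."""
    fits = [fit_fixed_beta(times, T, b) + (b,) for b in grid]
    return max(fits, key=lambda r: r[2])  # (mu, alpha, log-likelihood, beta)


times = [0.5, 0.6, 0.7, 3.0, 3.1, 6.0, 6.05, 6.1, 9.0]  # toy, strictly increasing
T = 10.0
mu_hat, alpha_hat, ll_hat, beta_hat = select_beta(times, T)
```

The recursive update of `s` keeps each likelihood evaluation linear in the number of events. The backtracking loop trades speed for the guarantee that every accepted step is non-decreasing.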


6. EXPERIMENTS 


6.1 Testing the Goodness of Fit 

To test the goodness of fit of the model, in Table 2 we compare the RMSE of the intensity in all dimensions, computed based on the observed module-student pairs, for the multi-dimensional Hawkes model (i.e. Equation 1), a single-dimensional Hawkes model and a Poisson model. Specifically, for single-dimensional Hawkes, we treat all activities as belonging to one dimension and estimate the intensity of each dimension using the univariate parameters $\alpha$, $\beta$, and $\mu$. For the Poisson model, in each dimension, we use the average activity arrival rate as the base rate and compute the RMSE for each dimension separately.

| Model | L | A | D |
| --- | --- | --- | --- |
| Hawkes (Multi) | 0.56 | 2.34 | 1.37 |
| Hawkes (Single) | 0.71 | 2.57 | 1.95 |
| Poisson | 3.22 | 6.91 | 3.73 |

Table 2: The goodness of fit to true data for each model in terms of intensity RMSE.

As we can see in Table 2, Poisson has the worst fit in all dimensions, possibly because of the non-Poissonian nature of the real data that we showed earlier. Single-dimensional Hawkes has comparable but slightly worse performance than the multi-dimensional model. One possible reason is that base rate and burstiness may differ between dimensions, and by modeling all types of learning materials as one activity type, the model can only capture the average trend across dimensions. To visualize how the multi-dimensional Hawkes processes fit the real data, we also present in Figure 4 the estimated intensity (blue) and true intensity (black) of a sample module-student pair. As we can see in this figure, the model mostly fits the real data well; only at some time points does it underestimate the expected number of activities that are about to happen.
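The text does not spell out how the "true" intensity is computed. The following sketch assumes it is the observed count per unit-width time bin, and scores the Poisson baseline (a constant rate equal to the average arrival rate) with RMSE. A fitted Hawkes intensity evaluated at the bin midpoints could be scored with the same `rmse` helper. The bin width and toy arrivals are hypothetical.

```python
import math


def empirical_intensity(times, T, width=1.0):
    """Observed counts per bin divided by the bin width (an assumed 'true' intensity)."""
    n_bins = math.ceil(T / width)
    counts = [0] * n_bins
    for t in times:
        counts[min(int(t // width), n_bins - 1)] += 1  # t == T falls into the last bin
    return [c / width for c in counts]


def rmse(estimate, truth):
    """Root mean squared error between two equally long intensity series."""
    return math.sqrt(sum((e - y) ** 2 for e, y in zip(estimate, truth)) / len(truth))


times = [0.5, 0.6, 0.7, 3.0, 3.1, 6.0, 6.05, 6.1, 9.0]  # toy arrivals in one dimension
T = 10.0
truth = empirical_intensity(times, T)
poisson_rate = len(times) / T  # Poisson baseline: average arrival rate
poisson_rmse = rmse([poisson_rate] * len(truth), truth)
```

Bursty arrivals like these produce large deviations from any constant rate, which is the behavior that self-exciting terms are meant to absorb.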


6.2 Model Parameter Analysis: Trends and Differences Between Dimensions
In this section, we analyze the estimated Hawkes parame- 
ters within and between different dimensions, to show their 
general trends and differences across different dimensions. 


We start this part with a correlation analysis of all Hawkes parameters, to show the general trends and possible differences between dimensions. Particularly, we calculate the Spearman rank correlation coefficients between the parameters learned for all module-student pairs, as shown in Figure 5. Recall that parameters $\alpha_{dd'}$, $\mu_d$, $\beta$ and $\alpha_d$ are, respectively, the between-dimension (or within-dimension if $d = d'$) excitation, the base rate, the decay rate (Equation 2) and the aggregated influence of dimension $d$ (Equation 3) for $d \in \{L, A, D\}$.
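Spearman's $\rho$ is the Pearson correlation of ranks, with tied values sharing their average rank. The stdlib-only sketch below shows how one coefficient in such a correlation matrix could be computed from per-pair parameter estimates. The values of `alpha_AA` and `mu_A` are made up for illustration. In practice, `scipy.stats.spearmanr` computes the same coefficient together with a p-value.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-pair estimates of self-excitation and base rate in dimension A.
alpha_AA = [0.82, 0.91, 0.40, 0.75, 0.95]
mu_A = [0.0009, 0.0002, 0.0015, 0.0004, 0.0001]
rho = spearman(alpha_AA, mu_A)
```

Ranks make the coefficient insensitive to the very different scales of $\alpha_{dd'}$ and $\mu_d$.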


We can see that self-excitation within dimensions (i.e. $\alpha_{dd}$) is generally negatively correlated with the base rate $\mu_d$ of the same dimension and with the decay $\beta$. This means that as the external stimuli lead to more and more expected arrivals, i.e. when regular activities come from the base rate, the effect of each previous activity on future ones tends to decrease, i.e. self-excitation gets weaker. In other words, in sequences with higher regularity, less burstiness is observed. It also means that activities with a slower decay rate usually arrive in a more bursty manner. Mapping this to our application of students interacting with learning modules, from the branching process point of view, a higher $\alpha_{dd}$ suggests a higher expected number of activities in a latent cluster of subtasks. On the other hand, the negative correlation also means a lower base rate and a lower number of immigrants; in other words, there are also fewer such latent clusters. One possible interpretation is that in each dimension, students divide their big learning task into subtasks and work on each individual subtask in a relatively bursty manner. This also suggests that students rarely show behaviors that are both highly intense and highly frequent (i.e. large $\mu_d$ and $\alpha_{dd}$). Similarly, activities that are both highly sparse and weakly self-exciting (i.e. small $\mu_d$ and $\alpha_{dd}$) are rarely observed either, as the correlation between $\mu_d$ and $\alpha_{dd}$ is negative for all $d \in \{L, A, D\}$.

[Figure 4 panels: Module Lecture, Module Assignment, Module Discussion; legend: simulated intensity, arrival of submission]

Figure 4: Estimated and true intensity of a sample module-student pair, modeled by multi-dimensional Hawkes.


Comparing the parameter correlations across activity types, we can see that dimensions L and D have high within- and between-dimension influence correlations. For example, the correlation between $\alpha_{LL}$ and $\alpha_{LA}$ is 0.97 in dimension L, and the correlation between $\alpha_{DL}$ and $\alpha_{DA}$ is 0.96. This implies that the influence of discussion and video lecture activities on other dimensions is largely consistent. For example, if the influence of video lectures on assignments is high, the influence of video lectures on discussions is likely to be high too. Similarly, if the pattern of interacting with video lectures in a module is bursty (high $\alpha_{LL}$), other activity types triggered by video lectures are likely to be bursty as well. However, the influence of assignment activities on discussions and video lectures is not significantly associated with the self-excitement of assignment activities. This could mean that a bursty set of assignment activities in a module does not necessarily lead to bursty video lecture activities. Similarly, the influence of assignment activities on discussions has a low correlation with the influence of assignment activities on video lectures. For instance, if a student starts an assignment intensively, very shortly after some intensive watching of video lectures, the student does not necessarily have highly intense discussion activities after the assignment. We can also see that the burstiness of assignment-triggered activities is less correlated with the base rates (e.g. $\alpha_{AD}$ vs. $\mu_A$). This suggests that the frequency of activities that come from external stimuli is not strongly associated with the influence of assignment activities on consequent activities in other dimensions. Taken altogether, it is interesting to see that activities associated with assignments tend to have different excitation patterns compared to video lectures and discussions. This may reflect the influence of deadlines, as assignments are the only activity type in this dataset that has deadlines and counts toward student grades.


Figure 5: Spearman Rank Correlation Coefficients between Hawkes Parameters.


6.3 Student Behaviors Characterized by Model Parameters

In the previous part, we showed the correlations between Hawkes parameters that represent within- and between-dimension relationships. In this part of the analysis, we focus on the different behaviors that are observed according to the learned parameters of each module-student pair. Additionally, we are interested in whether these learned Hawkes parameters are proper representatives of student procrastination. To do so, we first define a measure that can represent student procrastination in the absence of self-reported data. In the following, we go over the important assumptions, definitions and time measures that we use for procrastination.


Defining Delay as a Procrastination Measure. We assume that each student works on one module at a time, meaning that they do not work on several modules simultaneously. Furthermore, we assume that submitting the last attempt of the module's assignment marks the end of studying this module. According to these assumptions, we define module $i$'s end time for student $j$, $t^f_{ij}$, as the time stamp when the last module assignment was submitted. We then define the start time $t^s_{ij}$ as the later time stamp between $t^f_{i-1,j}$ and the time module $i$ becomes available. In other words, when student $j$ finishes learning module $i-1$, if module $i$ has already been made available, then their end time on module $i-1$ is defined as the start time for module $i$. Otherwise, the start time is the time when module $i$ becomes available or is published online. In each dimension $d$, we use $t^a_{dij}$ to denote the action time, which is defined as the time when the first activity in dimension $d$ takes place between start time $t^s_{ij}$ and end time $t^f_{ij}$.

[Figure 6 timeline: module i+1 opening and assignment i deadline, with markers for the last submission in module i and the first activity in A, L and D, and the corresponding delays]

Figure 6: Illustration of delay measures. As in Figure …, we use blue, green and yellow dashed lines to represent activities from dimensions L, A and D respectively.


Having the module start time and the student action time in dimension $d$, we can calculate how late a student started working on activities of type $d$ in the module using $t^a_{dij} - t^s_{ij}$. To factor in the duration differences between modules, we normalize this value by the module duration $t^f_{ij} - t^s_{ij}$. Eventually, we define the following measure to quantify student $j$'s normalized delay in dimension $d$ that is associated with module $i$:

$$\mathrm{delay}_d = \frac{t^a_{dij} - t^s_{ij}}{t^f_{ij} - t^s_{ij}}. \qquad (6)$$

One of the motivations to define the delay according to the start time $t^s_{ij}$ is that module $i+1$ is sometimes available before the assignment deadline of module $i$. At that time, student $j$ might still be working on module $i$. So, it would be unfair to count the time after module $i+1$ becomes available and before student $j$'s assignment submission as procrastination time for student $j$ on starting module $i+1$. On the other hand, if student $j$ finishes the assignment of module $i$ earlier than its deadline, the extra time they earned from the early submission can be used toward the next available module. If the student does not use this time, it is considered cramming behavior toward the next module $i+1$. An illustration of these definitions is presented in Figure 6.
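The following minimal sketch computes the delay for one student and one module. It assumes event times in a common unit, and it takes the start time as the later of finishing module $i-1$ and module $i$ becoming available, which is the reading that matches both cases described above. Function names and the toy timeline are hypothetical.

```python
def module_start_time(prev_end, available):
    """t^s_ij: the later of finishing module i-1 and module i becoming available."""
    return max(prev_end, available)


def first_action_time(events, dim, start, end):
    """t^a_dij: time of the first activity of type `dim` inside [start, end], or None."""
    times = [t for t, d in events if d == dim and start <= t <= end]
    return min(times) if times else None


def normalized_delay(action, start, end):
    """Equation 6: (t^a - t^s) / (t^f - t^s); None when there is no activity of that type."""
    if action is None:
        return None
    return (action - start) / (end - start)


# Hypothetical module: module i-1 finished at 10, module i opened at 12,
# and the last assignment submission of module i (t^f) happened at 20.
events = [(13.0, "L"), (15.0, "A"), (18.0, "D"), (19.0, "A")]
start = module_start_time(prev_end=10.0, available=12.0)
end = 20.0
delays = {d: normalized_delay(first_action_time(events, d, start, end), start, end)
          for d in "LAD"}
```

Because the delay is divided by the time spent on the module, values lie in [0, 1] and are comparable across modules of different lengths.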


Observing Two Behavior Groups. We now focus on the distribution of the learned Hawkes parameters to see whether we can observe any behavioral patterns across student-module pairs. Specifically, in Figure 7 we present the distribution of the learned $\alpha_{LL}$, $\alpha_{AA}$, and $\alpha_{DD}$. We can clearly observe two spikes in the density distribution of the influence parameters, more prominently in $\alpha_{LL}$ and $\alpha_{AA}$. Combining this observation with the correlation analysis in the previous section, we can see that there are two types of module-student interactions: those with higher frequency and lower burstiness versus those with lower frequency and higher burstiness.

Figure 7: Density distribution of $\alpha_{LL}$, $\alpha_{AA}$, $\alpha_{DD}$.

To statistically show the difference between these two types of interactions, we first cluster student-module pairs into two groups according to their $\alpha_{LL}$ and $\alpha_{AA}$, using the K-means clustering algorithm. Then, we test whether the learned Hawkes parameters, i.e. excitation parameters $\alpha_{dd'}$, base rate $\mu_d$, decay $\beta$ and aggregated influence $\alpha_d$ for $d \in \{L, A, D\}$, are statistically different across the two groups. Particularly, we conduct the Kruskal-Wallis test on each learned parameter between the two clusters. The average values of the parameters for each of the two clusters are shown in Table 3. Since the p-values for all tests are smaller than 0.0001, we do not show them in the table. These small p-values suggest that the differences between the two clusters are statistically significant for all parameters, indicating that the two types are meaningfully different.
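The grouping step can be sketched with Lloyd's K-means ($k = 2$) on $(\alpha_{LL}, \alpha_{AA})$ pairs, followed by the Kruskal-Wallis H statistic for one parameter. This is an illustration under stated assumptions: the H statistic here omits the tie correction, and a p-value would come from a chi-square distribution with $k - 1$ degrees of freedom (e.g. via `scipy.stats.kruskal`, which also applies the tie correction). All data values are hypothetical.

```python
import random


def two_means(points, iters=100, seed=0):
    """Lloyd's K-means with k = 2 on 2-D points such as (alpha_LL, alpha_AA)."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    labels = []
    for _ in range(iters):
        labels = [min((0, 1), key=lambda c: (p[0] - centers[c][0]) ** 2
                                            + (p[1] - centers[c][1]) ** 2)
                  for p in points]
        new_centers = []
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with average ranks for ties, no tie correction."""
    values = [v for g in groups for v in g]
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    h, start = 0.0, 0
    for g in groups:  # ranks are stored in the concatenated group order
        h += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * h - 3 * (n + 1)


pairs = [(0.10, 0.10), (0.12, 0.09), (0.11, 0.13), (0.90, 0.85), (0.88, 0.92), (0.91, 0.90)]
labels, centers = two_means(pairs)
mu_L = [0.0004, 0.0003, 0.0005, 0.00005, 0.00004, 0.00006]  # hypothetical base rates
groups = [[m for m, lab in zip(mu_L, labels) if lab == c] for c in (0, 1)]
h = kruskal_wallis_h(*groups)
```

A rank-based test fits here because the parameter distributions are bimodal and far from normal.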


Examining both groups more closely, we can see that the aggregated influence of dimension A, i.e. $\alpha_A$, is the highest among the 3 dimensions for both the type 1 and type 2 groups. That said, this influence mainly comes from the self-excitement in dimension A, i.e. $\alpha_{AA}$. $\mu_A$ is also the highest among all 3 dimensions. Combining these observations, we can see that assignment-related activities arrive more frequently and are highly influential in triggering consequent activities, especially assignment-related ones. Also, on average, the type 2 group has a much smaller base rate for video lectures ($\mu_L$) and discussions ($\mu_D$), meaning less density and regularity in those activities compared to type 1. However, for assignment-related activities, the base rate ($\mu_A$) as well as the aggregated influence in dimension A ($\alpha_A$) are higher in the type 2 group, which suggests overall denser and more intense assignment-related activity arrivals compared to the type 1 group.


Now, if we look at the differences between the two groups in terms of between-dimension relationships, we can see that $\alpha_{AL}$, i.e. the triggering effect of assignments on consequent video lecture activities (and similarly $\alpha_{AD}$, the assignment-triggered discussion activities), is much lower in the type 2 group than in the type 1 group. This difference is also notable in other between-dimension $\alpha$s. For example, the influence of assignments on video lectures ($\alpha_{AL}$) is far smaller than the influence of video lectures on discussions ($\alpha_{LD}$) in the type 2 group, while this difference is smaller in the type 1 group. This suggests that in the type 2 group, the interaction patterns with assignments are largely inconsistent with those of other dimensions. We note that, although the type 1 and type 2 clusters are created according to the $\alpha_{LL}$ and $\alpha_{AA}$ parameters only, we see significant differences in all other parameters of the two groups.


Delay in the Discovered Groups. Here, we aim to understand whether the two behavior types that we discovered in the previous part are associated with measures of procrastination. Particularly, we evaluate the differences in the delay measures defined in Equation 6 between the two clusters. The results are presented in Table 4. Again, all p-values are smaller than 0.0001. A major observation is that type 1 and type 2 have very different delays in all dimensions. Specifically, the delay in each dimension is lower in the type 1 group than in the type 2 group. As a result, we can call the type 2 group the delay group and the type 1 group the non-delay group. Given that the type 1 and type 2 behavioral clusters are formed based on Hawkes model parameters only, this observation suggests that the learned Hawkes parameters can capture delay as a procrastination measure. Also, we can see that in the delay (type 2) group, on average, students start the first discussion well after the first assignment activity takes place ($\mathrm{delay}_D > \mathrm{delay}_A$). However, in the non-delay (type 1) group, on average the first assignment activity happens after some discussion ($\mathrm{delay}_D < \mathrm{delay}_A$). In both groups, video lecture activities come before discussions or assignments.


Combined with our observations from the previous analysis, we see that the delay group not only starts the first activity in each dimension later than the other group, but also has much lower base rates $\mu_L$ and $\mu_D$. Consequently, the delay group (type 2) has less frequent but more bursty discussion- and lecture-related activities, while the activities of the non-delay group arrive in a less bursty but more frequent manner in these two dimensions. On the other hand, assignment activities are denser and more intense for the delay group. This combined observation shows that the Hawkes parameters can represent more information about student procrastination than the delay measure alone.


6.4 Student Grades Associated with Model Parameters

In the previous section, we concluded that the learned Hawkes model parameters not only represent delays, but also capture more procrastination-related behaviors. In the rest of this section, we explore whether the additional trends captured by the Hawkes model are more meaningful in association with student grades, compared to the delay measure. In particular, we are interested in the association between delay and student grades from the Hawkes processes point of view.


| Cluster | $d$ | $\alpha_{dL}$ | $\alpha_{dA}$ | $\alpha_{dD}$ | $\alpha_d$ | $\mu_d$ | $\beta$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Type 1 | L | 0.558±0.149 | 0.263±0.178 | 0.289±0.185 | 0.381±0.524 | 0.0003±0.0006 | |
| Type 1 | A | 0.107±0.272 | 0.820±0.125 | 0.101±0.264 | 0.462±0.459 | 0.0003±0.002 | 0.663±0.692 |
| Type 1 | D | 0.322±0.393 | 0.305±0.393 | 0.790±0.151 | 0.394±0.325 | 5.52E-5±1.8E-4 | |
| Type 2 | L | 0.874±0.108 | 0.823±0.135 | 0.816±0.134 | 0.795±0.582 | 4.82E-5±9.86E-5 | |
| Type 2 | A | 0.019±0.124 | 0.864±0.061 | 0.018±0.124 | 0.799±0.582 | 0.0004±0.004 | 0.425±0.249 |
| Type 2 | D | 0.699±0.429 | 0.696±0.430 | 0.936±0.092 | 0.590±0.238 | 1.19E-5±5.92E-5 | |

Table 3: Statistics of Hawkes parameters $\alpha_{dd'}$, $\mu_d$, $\beta$ and $\alpha_d$ for $d \in \{L, A, D\}$ in the two clusters ($\beta$ is shared across dimensions, so one value is reported per cluster).

| Cluster | $\mathrm{delay}_L$ | $\mathrm{delay}_A$ | $\mathrm{delay}_D$ |
| --- | --- | --- | --- |
| Type 1 | 0.08±0.228 | 0.575±0.411 | 0.338±0.385 |
| Type 2 | 0.108±0.274 | 0.722±0.360 | 0.819±0.337 |

Table 4: Statistics of delay measures $\mathrm{delay}_L$, $\mathrm{delay}_A$ and $\mathrm{delay}_D$ in the two clusters identified by Hawkes parameters.

Recall that $\mathrm{delay}_d$, defined in Equation 6, measures the normalized delay of the first activity in dimension $d$ of the student-module pair. Here, we define a new delay measure based on both the learned parameters of the Hawkes process and $\mathrm{delay}_d$. We then study whether this newly defined delay measure is more strongly associated with student grades than $\mathrm{delay}_d$. Specifically, having shown the between-dimension excitation interrelationships, it is reasonable to assume that these interrelationships are important in activity delays as well. For example, knowing that assignment-related activities can trigger follow-up activities in all 3 dimensions, delaying assignment-related activities also potentially causes consequent delays in other dimensions. Motivated by this, we propose $\mathrm{delay}^H_d$, which combines $\mathrm{delay}_d$ and between-dimension Hawkes parameters as follows:

$$\mathrm{delay}^H_d = \mathrm{delay}_d \cdot \sum_{d' \in \{L, A, D\}} \frac{1}{1 - \alpha_{dd'}}. \qquad (7)$$

As we mentioned in Section 5, $\frac{1}{1 - \alpha_{dd'}}$ can be seen as the statistically expected number of activities in a latent cluster that are triggered by an immigrant. The summation in Equation 7 therefore quantifies the potential loss per time unit, in terms of triggering other dimensions' activities, caused by delaying in dimension $d$. Taken altogether, $\mathrm{delay}^H_d$ describes the total delay in all 3 dimensions that is associated with the delay in dimension $d$.

To see whether $\mathrm{delay}^H_d$ provides more grade-related information than $\mathrm{delay}_d$, we look at the Spearman correlation between each of these two measures and students' assignment grades. The result of this correlation analysis is presented in Table 5. Our first observation is that the correlations of both delay measures with student grades are negative. However, these correlations are weaker and less significant for $\mathrm{delay}_d$ than for $\mathrm{delay}^H_d$. This is especially the case in the assignment dimension, where the correlation for $\mathrm{delay}_A$ is only marginally significant. The reason for this can be twofold: (1) compared to $\mathrm{delay}_d$, $\mathrm{delay}^H_d$ not only captures how late the action was taken in each dimension, but also provides some insight into student behavior trends throughout the learning process; and (2) as $\mathrm{delay}^H_d$ describes the time dependencies between dimensions, it is more powerful in explaining student activities in all dimensions as a whole, compared to point-estimate summaries of procrastination. In particular, one may overlook the importance of delaying discussion-related activities for assignment grades when considering the $\mathrm{delay}_D$ measure only. However, the stronger correlation between $\mathrm{delay}^H_D$ and grades suggests that starting discussion-related activities early is almost as important as starting video lectures early, probably because of the triggering effect of dimension D and the potential loss that its delay causes in all 3 dimensions.

| Measure | L | A | D | avg. |
| --- | --- | --- | --- | --- |
| $\mathrm{delay}^H_d$ | -0.339*** | -0.125* | -0.329*** | -0.264** |
| $\mathrm{delay}_d$ | -0.240** | -0.070. | -0.114* | -0.141* |

Table 5: Spearman's correlation with assignment score for each delay measure. Significance levels: p<0.001 ***, p<0.01 **, p<0.05 *, p<0.1 .
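Reading Equation 7 as $\mathrm{delay}^H_d = \mathrm{delay}_d \cdot \sum_{d'} 1/(1 - \alpha_{dd'})$, the Hawkes-based delay for one module-student pair reduces to a single line. The sketch below adds a guard, because the cluster-size factor $1/(1-\alpha_{dd'})$ is only finite for $\alpha_{dd'} < 1$. The function name and inputs are hypothetical, and the formula follows our reading of the equation rather than the authors' code.

```python
def hawkes_delay(delay_d, alpha_row):
    """delay^H_d = delay_d * sum_{d'} 1 / (1 - alpha_dd').

    delay_d:   normalized delay of the first activity in dimension d (Equation 6)
    alpha_row: influence weights alpha_dd' of dimension d on L, A and D
    """
    if any(not 0.0 <= a < 1.0 for a in alpha_row):
        raise ValueError("each alpha_dd' must lie in [0, 1) for a finite cluster size")
    return delay_d * sum(1.0 / (1.0 - a) for a in alpha_row)


# Hypothetical pair: delay 0.5 in dimension d, with influence weights on L, A, D.
example = hawkes_delay(0.5, [0.5, 0.0, 0.75])
```

Per-pair values computed this way could then be correlated with assignment grades using a Spearman coefficient, as in Table 5.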


7. CONCLUSION 


In this work, we proposed to use multi-dimensional Hawkes processes to model procrastination in student learning behavior. We showed that multi-dimensional Hawkes processes fit student activity counts better than their single-dimensional version and Poisson processes. By analyzing the correlations between the learned parameters of the Hawkes processes, we concluded that more bursty student sequences have less regular activities, that the burstiness of video lecture and discussion-related activities varies similarly, and that deadlines strongly affect the arrival times of assignment-related activities. We showed that Hawkes parameters can reveal two types of behaviors in the data that are associated with different delays: the delay group tends to have high within- and between-dimension excitation but a low base rate, while the non-delay group has a high base rate and lower excitation in all dimensions. From the branching process point of view, we gave a realistic interpretation of these types of behaviors: the non-delay group divides big tasks into many subtasks (high base rate), which leads to more frequent and less dense activities throughout the learning process. On the other hand, the delay group tends to work intensively in one dimension for a shorter period of time, followed by long pauses (high excitation but low base rate). We also showed that the Hawkes model parameters represent richer information than the delay measure alone, by defining a new Hawkes-based delay measure and associating it with student grades. Our experiments demonstrated that the between-dimension dependencies in the multi-dimensional Hawkes model are more strongly associated with student grades than the delay measure alone.


This study is limited in the number and variety of the datasets that we have experimented on. In the future, we plan to explore more datasets from various disciplines and platforms. Another limitation is the single measure that we use to evaluate procrastination ($\mathrm{delay}_d$). As a follow-up to this study, we aim to define and use more procrastination indicators, including self-reported procrastination measures.